Path integral policy improvement with differential dynamic programming
Path Integral Policy Improvement with Covariance Matrix Adaptation (PI2-CMA) is a step-based, model-free reinforcement learning approach that combines statistical estimation techniques with fundamental results from Stochastic Optimal Control. In essence, a policy distribution is improved iteratively using reward-weighted averaging of the corresponding rollouts. It has been conjectured that PI2-CMA somehow exploits gradient information contained in the reward-weighted statistics. To our knowledge, we are the first to expose the principle of this gradient extraction rigorously. Our findings reveal that PI2-CMA essentially obtains gradient information similar to the forward and backward passes of the Differential Dynamic Programming (DDP) method. It is then straightforward to extend the analogy with DDP by introducing a feedback term in the policy update. This suggests a novel algorithm which we coin Path Integral Policy Improvement with Differential Dynamic Programming (PI2-DDP). The resulting algorithm is similar to the previously proposed Sampled Differential Dynamic Programming (SaDDP), but we derive the method independently as a generalization of the PI2-CMA framework. Our derivations suggest small variations to SaDDP so as to increase performance. We validated our claims on a robot trajectory learning task.
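The reward-weighted averaging update described above can be sketched as follows. This is a generic illustration of a PI2-CMA-style parameter update (not the PI2-DDP extension with its feedback term), and the `cost_fn` interface is a placeholder assumption:

```python
import numpy as np

def pi2_cma_update(mu, sigma, cost_fn, n_rollouts=64, temperature=1.0, rng=None):
    """One reward-weighted-averaging update of a Gaussian policy N(mu, sigma).

    Illustrative sketch; `cost_fn` maps a sampled parameter vector to a
    scalar rollout cost (hypothetical interface).
    """
    rng = np.random.default_rng(rng)
    # Sample perturbed policy parameters (the "rollouts").
    samples = rng.multivariate_normal(mu, sigma, size=n_rollouts)
    costs = np.array([cost_fn(s) for s in samples])
    # Exponentiate normalized costs to obtain soft-max weights.
    z = (costs - costs.min()) / max(np.ptp(costs), 1e-12)
    w = np.exp(-z / temperature)
    w /= w.sum()
    # Reward-weighted mean update.
    new_mu = w @ samples
    # CMA-style reward-weighted covariance update (PSD by construction).
    centered = samples - mu
    new_sigma = sum(wi * np.outer(c, c) for wi, c in zip(w, centered))
    return new_mu, new_sigma
```

Iterating this update concentrates the policy distribution on low-cost parameter regions, which is the behaviour the abstract links to implicit gradient extraction.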
Information-Theoretic Policy Extraction from Partial Observations
We investigate the problem of extracting a control policy from a single or
multiple partial observation sequences. To this end, we cast the problem as a
Controlled Hidden Markov Model. We then sketch two information-theoretic
approaches to extract a policy, which we refer to as A Posterior Control
Distributions. The performance of both methods is investigated and compared
empirically on a linear tracking problem.
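A posterior over hidden states given a partial observation sequence, the basic quantity needed once the problem is cast as a Controlled Hidden Markov Model, can be computed with a standard forward filter. A minimal sketch, assuming known transition and observation matrices (the control input is held fixed here for simplicity; all matrices are illustrative):

```python
import numpy as np

def hmm_forward(T, O, obs, p0):
    """Forward filter for an HMM: P(x_t | y_1..y_t).

    T[i, j] = P(x'=j | x=i)   (transition matrix)
    O[j, y] = P(y | x=j)      (observation matrix)
    """
    alpha = p0 * O[:, obs[0]]
    alpha /= alpha.sum()
    for y in obs[1:]:
        # Propagate through the dynamics, then condition on the observation.
        alpha = (alpha @ T) * O[:, y]
        alpha /= alpha.sum()
    return alpha
```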
On entropy regularized Path Integral Control for trajectory optimization
In this article, we present a generalized view on Path Integral Control (PIC) methods. PIC refers to a particular class of policy search methods that are closely tied to the setting of Linearly Solvable Optimal Control (LSOC), a restricted subclass of nonlinear Stochastic Optimal Control (SOC) problems. This class is unique in the sense that it can be solved explicitly, yielding a formal optimal state trajectory distribution. In this contribution, we first review the PIC theory and discuss related algorithms tailored to policy search in general. We identify a generic design strategy that relies on the existence of an optimal state trajectory distribution and finds a parametric policy by minimizing the cross-entropy between the optimal distribution and a state trajectory distribution parametrized by a parametric stochastic policy. Inspired by this observation, we then formulate a SOC problem that shares traits with the LSOC setting yet covers a less restrictive class of problem formulations. We refer to this SOC problem as Entropy Regularized Trajectory Optimization. The problem is closely related to the Entropy Regularized Stochastic Optimal Control setting, which has lately received much attention from the Reinforcement Learning (RL) community. We analyze the theoretical convergence behavior of the state trajectory distribution sequence and draw connections with stochastic search methods tailored to classic optimization problems. Finally, we derive explicit updates and compare the implied Entropy Regularized PIC with earlier work in the context of both PIC and RL for derivative-free trajectory optimization.
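The generic design strategy, minimizing the cross-entropy between a cost-weighted ("optimal") trajectory distribution and a policy-induced one, can be sketched for the special case of a linear policy fitted by weighted least squares. The linear policy class and the interface are assumptions of this illustration, not the article's general formulation:

```python
import numpy as np

def cross_entropy_policy_fit(states, actions, costs, temperature=1.0):
    """Fit a linear policy u = K x by weighted maximum likelihood.

    Low-cost trajectories get exponentially larger weights, so minimizing
    the weighted squared error approximates minimizing the cross-entropy
    to the cost-weighted trajectory distribution.
    """
    w = np.exp(-(costs - costs.min()) / temperature)
    w /= w.sum()
    W = np.diag(w)
    # Weighted least squares: argmin_K sum_i w_i ||u_i - K x_i||^2
    K = np.linalg.solve(states.T @ W @ states, states.T @ W @ actions).T
    return K
```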
Optimizing state trajectories using surrogate models with application on a mechatronic example
The classic design and simulation methodologies that constitute today's engineers' main tools fall behind industry's ever-increasing complexity. The strive for technological advancement heralds new performance requirements, and optimality is no longer a concern limited to regime operation. Since the corresponding dynamic optimization problems incorporate accurate system models, current techniques are plagued by the high computational weight that these multi-disciplinary and high-dimensional system models carry. This imbalance advocates the need to adapt existing approaches. In this study we propose an algorithmic framework as an extension of the direct transcription method, which has already proven its usefulness in this matter. We suggest constructing a surrogate model of the derivative function that is iteratively refined in a region of interest. The method is then illustrated on an academic yet nonlinear example.
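The surrogate idea can be sketched in one dimension: sample the (expensive) derivative function in a region of interest and fit a cheap polynomial stand-in that the transcription solver can then query. The interface and the polynomial form are illustrative, not the study's exact construction:

```python
import numpy as np

def refine_surrogate(f, center, radius, degree=3, n_samples=20, rng=None):
    """Fit a 1-D polynomial surrogate of the derivative function f on the
    region of interest [center - radius, center + radius]."""
    rng = np.random.default_rng(rng)
    xs = rng.uniform(center - radius, center + radius, n_samples)
    ys = np.array([f(x) for x in xs])
    coeffs = np.polyfit(xs, ys, degree)
    return np.poly1d(coeffs)
```

In the iterative scheme, the region of interest would be re-centered on the current trajectory iterate and the surrogate refitted, so the expensive model is only evaluated where the optimizer actually operates.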
Polynomial Chaos reformulation in Nonlinear Stochastic Optimal Control with application on a drivetrain subject to bifurcation phenomena
This paper discusses a method enabling optimal control of nonlinear systems
that are subject to parametric uncertainty. A stochastic optimal tracking
problem is formulated that can be expressed as a function of the first two
stochastic moments of the state. The proposed formulation allows penalizing
system performance and system robustness independently. The use of polynomial
chaos expansions is investigated to arrive at a computationally tractable
formulation that rigorously expresses the stochastic moments as a function of
the polynomial expansion coefficients. It is then demonstrated how the
stochastic optimal control problem can be reformulated as a deterministic
optimal control problem in these coefficients. The proposed method is applied
to find a robust control input for the start-up of an eccentrically loaded
drivetrain that is inherently prone to bifurcation behaviour. A reference
trajectory is chosen to deliberately provoke a bifurcation. The proposed
framework is able to avoid the bifurcation behaviour regardless.
Comment: 7 pages; 5 figures; ICSTCC 2018, 22nd International Conference on
System Theory, Control and Computing, 10-12 October, Sinaia, Romania
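For a single standard-Gaussian parameter and a probabilists'-Hermite basis, the first two moments follow directly from the expansion coefficients, which is what makes the deterministic reformulation in the coefficients possible. A minimal sketch under those assumptions:

```python
from math import factorial

def pce_moments(coeffs):
    """First two stochastic moments from Hermite PCE coefficients
    y = sum_k c_k * He_k(xi), with xi ~ N(0, 1).

    By orthogonality of the probabilists' Hermite polynomials,
    E[He_k] = 0 for k > 0 and E[He_k^2] = k!, so:
        mean = c_0,  var = sum_{k>=1} c_k^2 * k!
    """
    mean = coeffs[0]
    var = sum(c**2 * factorial(k) for k, c in enumerate(coeffs[1:], start=1))
    return mean, var
```

For example, the affine expansion y = a + b*xi has coefficients [a, b], so the mean is a and the variance is b**2, as expected for a shifted, scaled Gaussian.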
Polynomial chaos explicit solution of the optimal control problem in model predictive control
A difficulty still hindering the widespread application of Model Predictive Control (MPC) methodologies remains the computational burden of solving the associated Optimal Control (OC) problem for every control period. In contrast to the numerous approximation techniques that pursue acceleration of the online optimization procedure, relatively little work has been devoted to shifting the optimization effort to a precomputational phase, especially for nonlinear system dynamics. Recently, interest in the theory of generalized Polynomial Chaos (gPC) revived as a means to appraise the influence of variable parameters on dynamic system behaviour, and it proved to yield reliable results. This article establishes an explicit solution of the multi-parametric Nonlinear Problem (mp-NLP) based on the theoretical framework of gPC, enabling a polynomially approximated nonlinear feedback law formulation. This results in computations fast enough for real-time MPC, with corresponding control frequencies up to 2 kHz.
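Once the mp-NLP has been solved offline, the online step reduces to evaluating the precomputed polynomial feedback law at the measured state, which is why kHz-range control frequencies become feasible. A minimal single-input sketch with placeholder coefficients:

```python
import numpy as np

def explicit_mpc_control(poly_coeffs, x):
    """Evaluate a precomputed polynomial feedback law u = p(x).

    The coefficients would come from the offline gPC-based solution of
    the mp-NLP; here they are placeholders for illustration.
    """
    # Online cost is a single polynomial evaluation, no optimization.
    return np.polyval(poly_coeffs, x)
```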
Model-based feedforward targeting of magnetic microparticles in fluids using dynamic optimization
External magnetic field gradients originating from electromagnets can generate forces on ferromagnetic microparticles to aid and enable precise local targeting of these particles. To steer these magnetic particles from their initial position to a desired target zone in a fluid, a control strategy for the proper activation of the electromagnets is required. We propose a model-based control strategy that performs dynamic optimization with respect to a given metric, resulting in an optimal particle trajectory. Here, minimum power consumption of the electromagnets is considered as the metric. Furthermore, a dynamical model containing the magnetic and fluidic forces acting on the particles is incorporated in the dynamic optimization. The results show the benefits of the presented approach, since it allows control of the electromagnets in open loop.
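For a drag-dominated (first-order) particle model, an assumed simplification rather than the paper's full magnetic-fluidic model, the minimum-power open-loop input can even be written down in closed form:

```python
import numpy as np

def min_power_input(x0, x_target, horizon, dt, mobility):
    """Open-loop input minimizing sum_k u_k^2 for the simplified model
        x_{k+1} = x_k + dt * mobility * u_k.

    For this linear model the minimum-energy input that reaches the
    target in `horizon` steps is constant over the horizon.
    """
    u = (x_target - x0) / (horizon * dt * mobility)
    return np.full(horizon, u)
```

In the paper's setting, the nonlinear magnetic force model makes the analogous problem a numerical dynamic optimization rather than a closed-form one, but the structure (penalize power, constrain the terminal position) is the same.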